MTR-VP: Towards End-to-End Trajectory Planning through Context-Driven Image Encoding and Multiple Trajectory Prediction

Keskar, Maitrayee, Trivedi, Mohan, Greer, Ross

arXiv.org Artificial Intelligence

We present a method for trajectory planning in autonomous driving that learns image-based context embeddings aligned with motion prediction frameworks and planning-based intention input. Within our method, a ViT encoder takes raw images and past kinematic state as input and is trained to produce context embeddings, drawing inspiration from those generated by the recent MTR (Motion Transformer) encoder, effectively substituting learned visual representations for map-based features. MTR provides a strong foundation for multimodal trajectory prediction by localizing agent intent and refining motion iteratively via motion query pairs. We name our approach MTR-VP (Motion Transformer for Vision-based Planning); instead of the learnable intention queries used in the MTR decoder, we apply cross-attention between the intention embedding and the context embeddings, which encode a combination of the driving scene and past vehicle states. We evaluate our method on the Waymo End-to-End Driving Dataset, which requires predicting the agent's future 5-second trajectory in bird's-eye-view coordinates from prior camera images, agent pose history, and routing goals. We analyze our architecture using ablation studies that remove the input images and the multiple-trajectory output. Our results suggest that transformer-based methods for combining visual features with kinematic features such as past trajectory are not effective at fusing both modalities into useful scene context embeddings, even when intention embeddings are augmented with foundation-model representations of scene context from CLIP and DINOv2, but that predicting a distribution over multiple futures instead of a single future trajectory boosts planning performance.
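The fusion step described in the abstract — an intention embedding attending over scene-context embeddings — can be sketched as single-head cross-attention. This is a minimal NumPy illustration, not the paper's implementation: the identity projections (no learned Q/K/V matrices), the embedding dimension, and the token counts are all assumptions for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(intent, context):
    """Single-head cross-attention: the intention embedding supplies the
    queries; the context embeddings supply keys and values (identity
    projections for brevity)."""
    d_k = intent.shape[-1]
    scores = intent @ context.T / np.sqrt(d_k)  # (n_q, n_ctx)
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ context                    # (n_q, d)

rng = np.random.default_rng(0)
intent = rng.normal(size=(1, 64))    # routing-goal / intention embedding
context = rng.normal(size=(10, 64))  # context tokens (image + past kinematics)
out = cross_attention(intent, context)
```

In the full model the attended output would feed a decoder head that emits multiple candidate trajectories with associated probabilities; here the sketch stops at the fused embedding.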


OmniLens++: Blind Lens Aberration Correction via Large LensLib Pre-Training and Latent PSF Representation

Jiang, Qi, Qian, Xiaolong, Gao, Yao, Sun, Lei, Yang, Kailun, Yi, Zhonghua, Li, Wenyong, Yang, Ming-Hsuan, Van Gool, Luc, Wang, Kaiwei

arXiv.org Artificial Intelligence

The emerging deep-learning-based lens library pre-training (LensLib-PT) pipeline offers a new avenue for blind lens aberration correction by training a universal neural network, demonstrating strong capability in handling diverse unknown optical degradations. This work proposes the OmniLens++ framework, which resolves two challenges that hinder the generalization ability of existing pipelines: the difficulty of scaling data and the absence of prior guidance characterizing optical degradation. To improve data scalability, we expand the design specifications to increase the degradation diversity of the lens source, and we sample a more uniform distribution by quantifying the spatial-variation patterns and severity of optical degradation. In terms of model design, to leverage Point Spread Functions (PSFs), which intuitively describe optical degradation, as guidance in a blind paradigm, we propose the Latent PSF Representation (LPR). A VQVAE framework is introduced to learn latent features of the LensLib's PSFs, assisted by modeling the optical degradation process to constrain the learning of degradation priors. Experiments on diverse aberrations of real-world lenses and a synthetic LensLib show that OmniLens++ exhibits state-of-the-art generalization capacity in blind aberration correction. Beyond performance, AODLibpro is verified as a scalable foundation for more effective training across diverse aberrations, and LPR can further tap the potential of a large-scale LensLib. The source code and datasets will be made publicly available at https://github.com/zju-jiangqi/OmniLens2.
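The core of a VQVAE bottleneck like the one behind the Latent PSF Representation is nearest-codebook quantization: each encoder output is snapped to its closest learned code. The sketch below shows only that lookup step in NumPy; the codebook size, latent dimension, and random data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def vq_lookup(z_e, codebook):
    """Nearest-codebook quantization, the discretization step of a VQVAE:
    each encoder output vector z_e[i] is replaced by its closest code."""
    # squared Euclidean distance between every latent and every codebook entry
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K)
    idx = d.argmin(axis=1)          # index of the nearest code per latent
    return codebook[idx], idx       # quantized latents, code indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))  # 32 learned PSF codes, 8-dim (illustrative)
z_e = rng.normal(size=(5, 8))        # encoder outputs for 5 PSF patches
z_q, idx = vq_lookup(z_e, codebook)
```

In training, the non-differentiable lookup is typically bypassed with a straight-through gradient estimator plus codebook and commitment losses; those parts are omitted here.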



An ensemble diversity approach to supervised binary hashing

Miguel A. Carreira-Perpinan, Ramin Raziperchikolaei

Neural Information Processing Systems

Binary hashing is a well-known approach for fast approximate nearest-neighbor search in information retrieval. Much work has focused on affinity-based objective functions involving the hash functions or binary codes. These objective functions encode neighborhood information between data points and are often inspired by manifold learning algorithms. They ensure that the hash functions differ from each other through constraints or penalty terms that encourage codes to be orthogonal or dissimilar across bits, but this couples the binary variables and complicates the already difficult optimization. We propose a much simpler approach: we train each hash function (or bit) independently from each other, but introduce diversity among them using techniques from classifier ensembles. Surprisingly, we find that not only is this faster and trivially parallelizable, but it also improves over the more complex, coupled objective function, and achieves state-of-the-art precision and recall in experiments with image retrieval.
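The diversity mechanism the abstract describes — independent per-bit training with ensemble tricks instead of coupling terms — can be sketched as follows. This is a toy NumPy illustration under stated assumptions: the single-bit learner here is a class-mean-difference hyperplane standing in for whatever binary classifier the method actually trains, and diversity comes from bootstrap samples plus random feature subsets, as in bagging and random subspaces.

```python
import numpy as np

def train_hash_ensemble(X, y, n_bits=8, feat_frac=0.5, seed=0):
    """Train each hash bit independently; diversity comes from classifier-
    ensemble techniques (bootstrap resampling + random feature subsets),
    not from a coupled joint objective over all bits."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(feat_frac * d))
    bits = []
    for _ in range(n_bits):
        rows = rng.integers(0, n, size=n)             # bootstrap sample
        feats = rng.choice(d, size=k, replace=False)  # random feature subset
        Xb, yb = X[rows][:, feats], y[rows]
        m1, m0 = Xb[yb == 1].mean(0), Xb[yb == 0].mean(0)
        w = m1 - m0                     # toy single-bit learner: class-mean
        b = -w @ (m1 + m0) / 2          # hyperplane through the midpoint
        bits.append((feats, w, b))
    return bits

def hash_codes(X, bits):
    """Binary code: one thresholded hyperplane per independently trained bit."""
    return np.stack([(X[:, f] @ w + b > 0).astype(int) for f, w, b in bits], 1)
```

Because every bit is trained in isolation, the loop parallelizes trivially, which is exactly the practical advantage the abstract claims over coupled objectives.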





A fast, universal algorithm to learn parametric nonlinear embeddings

Miguel A. Carreira-Perpinan, Max Vladymyrov

Neural Information Processing Systems

Nonlinear embedding algorithms such as stochastic neighbor embedding do dimensionality reduction by optimizing an objective function involving similarities between pairs of input patterns. The result is a low-dimensional projection of each input pattern. A common way to define an out-of-sample mapping is to optimize the objective directly over a parametric mapping of the inputs, such as a neural net. This can be done using the chain rule and a nonlinear optimizer, but is very slow, because the objective involves a quadratic number of terms each dependent on the entire mapping's parameters. Using the method of auxiliary coordinates, we derive a training algorithm that works by alternating steps that train an auxiliary embedding with steps that train the mapping. This has two advantages: 1) The algorithm is universal in that a specific learning algorithm for any choice of embedding and mapping can be constructed by simply reusing existing algorithms for the embedding and for the mapping. A user can then try possible mappings and embeddings with less effort.
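The alternating scheme the abstract describes — a Z-step that improves a free (auxiliary) embedding under a quadratic pull toward the mapping's output, then an F-step that refits the mapping to predict those coordinates — can be sketched as below. Everything concrete here is a placeholder assumption: the embedding objective is a toy distance-preserving stress (not SNE), the mapping is linear least squares (not a neural net), and the step sizes are arbitrary; only the alternation pattern reflects the method of auxiliary coordinates.

```python
import numpy as np

def mac_train(X, dim=2, mu=1.0, iters=20, z_steps=10, lr=0.01, seed=0):
    """Method-of-auxiliary-coordinates skeleton: alternate a Z-step on the
    embedding objective plus mu * ||Z - F(X)||^2, with an F-step that refits
    the mapping F to the current auxiliary coordinates Z."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # target distances
    Z = rng.normal(scale=0.1, size=(n, dim))              # free embedding
    W = np.zeros((X.shape[1], dim))                       # linear mapping
    for _ in range(iters):
        # Z-step: gradient descent on stress(Z) + mu * ||Z - X W||^2
        for _ in range(z_steps):
            diff = Z[:, None] - Z[None, :]
            dz = np.linalg.norm(diff, axis=-1) + 1e-9
            g = ((dz - D) / dz)[:, :, None] * diff
            grad = 4 * g.sum(1) + 2 * mu * (Z - X @ W)
            Z -= lr * grad
        # F-step: reuse an off-the-shelf solver to fit the mapping to Z
        W = np.linalg.lstsq(X, Z, rcond=None)[0]
    return Z, W

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))
Z, W = mac_train(X)
```

The "universal" claim corresponds to the two inner calls being swappable: any existing embedding optimizer can serve as the Z-step and any regression learner as the F-step, without deriving new chain-rule gradients.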